
bounded_vec: Avoid unnecessary allocation when decoding#713

Merged
mrcnski merged 6 commits into master from mrcnski/boundedvec-decode-allocation
Feb 1, 2023

Conversation

@mrcnski
Contributor

@mrcnski mrcnski commented Jan 26, 2023

Needed for paritytech/polkadot#6603. See paritytech/polkadot#6603 (comment):

You want to ensure that you don't even start decoding/allocating when the length of the vector is more than the
allowed maximum.

You first decode the length of the vector and then reject early if it is too long.
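The suggested approach can be sketched as follows. This is a simplified illustration, not the PR's actual implementation: it uses a plain little-endian u32 length prefix instead of parity-scale-codec's Compact encoding, and a made-up function name.

```rust
// Sketch of early length rejection: read only the length prefix, reject if it
// exceeds the bound, and allocate nothing until the claim has been validated.
// (Simplified: a fixed 4-byte u32 prefix stands in for the Compact encoding.)
fn decode_bounded_u8_vec(input: &mut &[u8], bound: usize) -> Result<Vec<u8>, &'static str> {
    // 1. Decode only the length prefix first.
    let prefix: [u8; 4] = input.get(..4).ok_or("input too short")?.try_into().unwrap();
    let len = u32::from_le_bytes(prefix) as usize;
    *input = &input[4..];

    // 2. Reject before any allocation if the claimed length exceeds the bound.
    if len > bound {
        return Err("BoundedVec exceeds its limit");
    }

    // 3. Only now allocate and read the elements.
    if input.len() < len {
        return Err("input too short for claimed length");
    }
    let out = input[..len].to_vec();
    *input = &input[len..];
    Ok(out)
}

fn main() {
    // A payload that claims u32::MAX elements is rejected without allocating.
    let mut huge = &[0xffu8, 0xff, 0xff, 0xff, 1, 2, 3][..];
    assert!(decode_bounded_u8_vec(&mut huge, 8).is_err());

    // A well-formed payload within the bound decodes normally.
    let mut ok = &[3u8, 0, 0, 0, 10, 20, 30][..];
    assert_eq!(decode_bounded_u8_vec(&mut ok, 8).unwrap(), vec![10, 20, 30]);
}
```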

Member

@bkchr bkchr left a comment


What @ggwpez said.

Contributor

@KiChjang KiChjang left a comment


Also probably requires a version bump for substrate to pick up the changes.

@mrcnski
Contributor Author

mrcnski commented Jan 27, 2023

Also probably requires a version bump for substrate to pick up the changes.

Should I do that in this PR?

Also, I'll try to get to BoundedBTreeMap and BoundedBTreeSet soon, haven't had a chance yet.

Member

@ggwpez ggwpez left a comment


Should I do that in this PR?

You can also do it in another PR, just thought it would fit in here 😄

@bkchr
Member

bkchr commented Jan 27, 2023

As the crate is currently lacking tests, I would be happy if this PR could start with some ;) We could ensure that we don't consume more data than the compact length.

@mrcnski
Contributor Author

mrcnski commented Jan 28, 2023

@bkchr There are tests, including for that case -- see too_big_vec_fail_to_decode and too_big_fail_to_decode. Or do you mean we should check whether unnecessary memory is consumed (allocated)? I wasn't sure how to do that, or if such a test was necessary.

@mrcnski
Contributor Author

mrcnski commented Jan 28, 2023

This PR is already approved, but I would like a quick re-approval from someone (to sanity-check the last two commits). :)

@ggwpez
Member

ggwpez commented Jan 28, 2023

I wasn't sure how to do that, or if such a test was necessary.

Dumb idea: just use u32::MAX as the bound and 1 MiB as the value, then try to decode that. It should fail to allocate if the implementation is wrong.

@bkchr
Member

bkchr commented Jan 28, 2023

I wasn't sure how to do that, or if such a test was necessary.

let data = something.encode();
let data_input = &mut &data[..];

Something::decode(data_input).unwrap_err();
assert_eq!(data_input.len(), data.len() - Compact::<u32>::compact_len(&(data.len() as u32)));

This should do it. Also some easy tests ensuring that the Vec etc. encodings are the same as the bounded impls.
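A self-contained version of that assertion can be sketched as below. It is illustrative only: a fixed 4-byte length prefix stands in for the real Compact<u32> encoding (whose prefix length varies), and `decode_bounded` is a made-up name.

```rust
// Sketch: after a failed bounded decode, the input should have advanced past
// the length prefix and nothing more, i.e. no element data was consumed.
fn decode_bounded(input: &mut &[u8], bound: usize) -> Result<Vec<u8>, &'static str> {
    let prefix: [u8; 4] = input.get(..4).ok_or("input too short")?.try_into().unwrap();
    let len = u32::from_le_bytes(prefix) as usize;
    *input = &input[4..]; // the prefix is consumed even when we reject below
    if len > bound {
        return Err("exceeds limit");
    }
    input.get(..len).map(|s| s.to_vec()).ok_or("input too short for claimed length")
}

fn main() {
    // 5 elements encoded, but the bound is 2, so decoding must fail early.
    let data: Vec<u8> = [5u32.to_le_bytes().as_slice(), &[1, 2, 3, 4, 5]].concat();
    let input = &mut &data[..];
    decode_bounded(input, 2).unwrap_err();

    // Only the 4-byte length prefix was consumed before the early rejection.
    assert_eq!(input.len(), data.len() - 4);
}
```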

There are tests, including for that case -- see too_big_vec_fail_to_decode and too_big_fail_to_decode

Sorry 🙈

return Err("BoundedBTreeMap exceeds its limit".into())
}
input.descend_ref()?;
let inner = Result::from_iter((0..len).map(|_| Decode::decode(input)))?;
Member

No idea if Rust handles this in the most efficient way; otherwise we should maybe pre-allocate as an optimization.

Member

I mean, this is now debatable with the bound in place, but for the normal BTreeMap we haven't done this, as someone could send you a fake encoded BTreeMap that pretends to be u32::MAX in size, maybe leading to an OOM.
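A common mitigation for that concern (not what this PR does, just an illustration) is to cap the pre-allocation at a fixed budget instead of trusting the claimed length; `MAX_PREALLOC_BYTES` and the function name below are made up for this sketch.

```rust
// Capped pre-allocation: honor the claimed length only up to a fixed byte
// budget, so a fake huge length cannot force a huge up-front allocation.
// The Vec still grows normally if the input really does contain more items.
const MAX_PREALLOC_BYTES: usize = 16 * 1024; // made-up budget for this sketch

fn vec_with_capped_capacity<T>(claimed_len: usize) -> Vec<T> {
    let elem_size = core::mem::size_of::<T>().max(1);
    Vec::with_capacity(claimed_len.min(MAX_PREALLOC_BYTES / elem_size))
}

fn main() {
    // A claimed length of u32::MAX only reserves the capped budget up front.
    let v: Vec<u64> = vec_with_capped_capacity(u32::MAX as usize);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() < u32::MAX as usize);
}
```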

Contributor Author

Interesting; I would have to compile with optimizations and see. At any rate, I don't want to deviate too much from the standard Decode impl for the unbounded types, at least in this PR.

Member

Should be fine to merge as is.

Contributor

Is this just another way of calling collect?

Suggested change
let inner = Result::from_iter((0..len).map(|_| Decode::decode(input)))?;
let inner = (0..len).map(|_| Decode::decode(input)).collect::<Result<Vec<_>, codec::Error>>()?;

Contributor Author

Yes, I think it is. http://doc.rust-lang.org/1.66.1/std/result/enum.Result.html#method.from_iter

Currently this impl is the same as the unbounded version, apart from the length check. I think it should be kept the same unless there's a good reason not to.

Member

@bkchr bkchr left a comment

Ty! But let us use the same derives everywhere.

return Err("BoundedBTreeMap exceeds its limit".into())
}
input.descend_ref()?;
let inner = Result::from_iter((0..len).map(|_| Decode::decode(input)))?;
Member

Should be fine to merge as is.

///
/// Unlike a standard `BTreeMap`, there is an enforced upper limit to the number of items in the
/// map. All internal operations ensure this bound is respected.
#[cfg_attr(feature = "std", derive(Hash))]
Member

Suggested change
#[cfg_attr(feature = "std", derive(Hash))]
#[cfg_attr(feature = "std", derive(Hash, serde::Serialize, serde::Deserialize), serde(transparent))]

Contributor Author

I'm not sure about deriving these; Deserialize would need a custom impl to add bounds checks, like we do now with BoundedVec. And the custom impl looks annoying to write, so I'm thinking we shouldn't add it until there's actually a need for it.

Contributor Author

@bkchr I'm going to go ahead and merge -- these derives can be a follow-up if we really want them.

///
/// Unlike a standard `BTreeSet`, there is an enforced upper limit to the number of items in the
/// set. All internal operations ensure this bound is respected.
#[cfg_attr(feature = "std", derive(Hash))]
Member

Suggested change
#[cfg_attr(feature = "std", derive(Hash))]
#[cfg_attr(feature = "std", derive(Hash, serde::Serialize, serde::Deserialize), serde(transparent))]

/// As the name suggests, the length of the queue is always bounded. All internal operations ensure
/// this bound is respected.
#[cfg_attr(feature = "std", derive(Serialize), serde(transparent))]
#[cfg_attr(feature = "std", derive(Hash, Serialize), serde(transparent))]
Member

Suggested change
#[cfg_attr(feature = "std", derive(Hash, Serialize), serde(transparent))]
#[cfg_attr(feature = "std", derive(Hash, serde::Serialize, serde::Deserialize), serde(transparent))]

@mrcnski mrcnski merged commit 07127b6 into master Feb 1, 2023
@mrcnski mrcnski deleted the mrcnski/boundedvec-decode-allocation branch February 1, 2023 08:07
@mrcnski
Contributor Author

mrcnski commented Feb 1, 2023

@KiChjang Looks like I'm not able to publish 0.1.4, you're the only listed owner. Possible to add paritytech/Core devs as an owner?

@KiChjang
Contributor

KiChjang commented Feb 2, 2023

I've added the core-devs team as the owner for the crate on crates.io.

